Determining the Unithood of Word Sequences Using a Probabilistic Approach
Abstract
Most research related to unithood was conducted as part of a larger effort to determine termhood. Consequently, novelties are rare in this small sub-field of term extraction. In addition, existing work was mostly empirically motivated and derived. We propose a new probabilistically-derived measure, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from the Google search engine for the measurement of unithood. Our comparative study using 1,825 test cases against an existing empirically-derived function revealed an improvement in terms of precision, recall and accuracy.
Similar resources
Determining the Unithood of Word Sequences using Mutual Information and Independence Measure
Most work related to unithood was conducted as part of a larger effort to determine termhood. Consequently, the number of independent studies that examine the notion of unithood and produce dedicated techniques for measuring unithood is extremely small. We propose a new approach, independent of any influences of termhood, that provides dedicated measures to gather linguistic eviden...
Multi-granulation fuzzy probabilistic rough sets and their corresponding three-way decisions over two universes
This article introduces a general framework of multi-granulation fuzzy probabilistic rough sets (MG-FPRSs) models in multi-granulation fuzzy probabilistic approximation space over two universes. Four types of MG-FPRSs are established, by the four different conditional probabilities of fuzzy event. For different constraints on parameters, we obtain four kinds of each type MG-FPRSs...
Presenting a probabilistic model for determining text coherence in interactive question answering systems
As in many fields of computational linguistics, evaluation plays an important role in interactive question answering (IQA) systems. The coherence between the questions and the answers exchanged between the user and the system is one of the important criteria in evaluating these systems. In this paper, a new approach to determine the degree of coherence of text generated by IQA systems is presented. Th...
Using it Bundles in Published and Unpublished Writings
Lexical bundles are known as important elements of coherent discourse that have been the subject of much research. While the previous research has been mainly concerned with exploring variations in the use of these word sequences across different registers and disciplines, very few studies have addressed the use of some particular groups of lexical bundles within some types of academic writing....
A Study on Terminology Extraction Based on Classified Corpora
Algorithms for automatic term extraction in a specific domain should consider at least two issues, namely Unithood and Termhood (Kageura, 1996). Unithood refers to the degree of a string to occur as a word or a phrase. Termhood (Chen Yirong, 2005) refers to the degree of a word or a phrase to occur as a domain specific concept. Unlike unithood, study on termhood is not yet widely reported. In cla...